Art of Assembly Language: Chapter Eight
- Chapter Eight - MASM: Directives & Pseudo-Opcodes
- 8.0 - Chapter Overview
- 8.1 - Assembly Language Statements
- 8.2 - The Location Counter
- 8.3 - Symbols
- 8.4 - Literal Constants
- 8.4.1 - Integer Constants
- 8.4.2 - String Constants
- 8.4.3 - Real Constants
- 8.4.4 - Text Constants
- 8.5 - Declaring Manifest Constants Using Equates
- 8.6 - Processor Directives
- 8.7 - Procedures
- 8.8 - Segments
- 8.8.1 - Segment Names
- 8.8.2 - Segment Loading Order
- 8.8.3 - Segment Operands
- 8.8.3.1 - The ALIGN Type
- 8.8.3.2 - The COMBINE Type
- 8.8.4 - The CLASS Type
- 8.8.5 - The Read-only Operand
- 8.8.6 - The USE16, USE32, and FLAT Options
- 8.8.7 - Typical Segment Definitions
- 8.8.8 - Why You Would Want to Control the Loading Order
- 8.8.9 - Segment Prefixes
- 8.8.10 - Controlling Segments with the ASSUME Directive
- 8.8.11 - Combining Segments: The GROUP Directive
- 8.8.12 - Why Even Bother With Segments?
- 8.9 - The END Directive
- 8.10 - Variables
- 8.11 - Label Types
- 8.11.1 - How to Give a Symbol a Particular Type
- 8.11.2 - Label Values
- 8.11.3 - Type Conflicts
- 8.12 - Address Expressions
- 8.12.1 - Symbol Types and Addressing Modes
- 8.12.2 - Arithmetic and Logical Operators
- 8.12.3 - Coercion
- 8.12.4 - Type Operators
- 8.12.5 - Operator Precedence
- 8.13 - Conditional Assembly
- 8.13.1 - IF Directive
- 8.13.2 - IFE directive
- 8.13.3 - IFDEF and IFNDEF
- 8.13.4 - IFB, IFNB
- 8.13.5 - IFIDN, IFDIF, IFIDNI, and IFDIFI
- 8.14 - Macros
- 8.14.1 - Procedural Macros
- 8.14.2 - Macros vs. 80x86 Procedures
- 8.14.3 - The LOCAL Directive
- 8.14.4 - The EXITM Directive
- 8.14.5 - Macro Parameter Expansion and Macro Operators
- 8.14.6 - A Sample Macro to Implement For Loops
- 8.14.7 - Macro Functions
- 8.14.8 - Predefined Macros, Macro Functions, and Symbols
- 8.14.9 - Macros vs. Text Equates
- 8.14.10 - Macros: Good and Bad News
- 8.15 - Repeat Operations
- 8.16 - The FOR and FORC Macro Operations
- 8.17 - The WHILE Macro Operation
- 8.18 - Macro Parameters
- 8.19 - Controlling the Listing
- 8.19.1 - The ECHO and %OUT Directives
- 8.19.2 - The TITLE Directive
- 8.19.3 - The SUBTTL Directive
- 8.19.4 - The PAGE Directive
- 8.19.5 - The .LIST, .NOLIST, and .XLIST Directives
- 8.19.6 - Other Listing Directives
- 8.20 - Managing Large Programs
- 8.20.1 - The INCLUDE Directive
- 8.20.2 - The PUBLIC, EXTERN, and EXTRN Directives
- 8.20.3 - The EXTERNDEF Directive
- 8.21 - Make Files
- 8.22 - Sample Program
- 8.22.1 - EX8.MAK
- 8.22.2 - Matrix.A
- 8.22.3 - EX8.ASM
- 8.22.4 - GETI.ASM
- 8.22.5 - GetArray.ASM
- 8.22.6 - XProduct.ASM
Copyright 1996 by Randall Hyde
All rights reserved.
Duplication other than for immediate display through a browser is prohibited
by U.S. Copyright Law.
This material is provided on-line as a beta-test of this text. It is for
the personal use of the reader only. If you are interested in using this
material as part of a course, please contact
rhyde@cs.ucr.edu
Supporting software and other materials are available via anonymous ftp
from ftp.cs.ucr.edu. See the "/pub/pc/ibmpcdir" directory for
details. You may also download the material from "Randall Hyde's Assembly
Language Page" at URL:
http://webster.ucr.edu
Notes:
This document does not contain the laboratory exercises, programming assignments,
exercises, or chapter summary. These portions were omitted for several reasons:
either they wouldn't format properly, they contained hyperlinks that were
too much work to resolve, they were under constant revision, or they were
not included for security reasons. Such omission should have very little
impact on the reader interested in learning this material or evaluating
this document.
This document was prepared using Harlequin's Web Maker 2.2 and Quadralay's
Webworks Publisher. Since HTML does not support the rich formatting options
available in Framemaker, this document is only an approximation of the actual
chapter from the textbook.
If you are absolutely dying to get your hands on a version other than HTML,
you might consider having the UCR Printing and Reprographics Department run
you off a copy on their Xerox machines. For details, please read the following
EMAIL message I received from the Printing and Reprographics Department:
Hello Again Professor Hyde,
Dallas gave me permission to take orders for the Computer Science 13 Manuals.
We would need to take charge card orders. The only cards we take are: Master
Card, Visa, and Discover. They would need to send the name, numbers, expiration
date, type of card, and authorization to charge $95.00 for the manual and
shipping, also we should have their phone number in case the company has
any trouble delivery. They can use my e-mail address for the orders and
I will process them as soon as possible. I would assume that two weeks would
be sufficient for printing, packages and delivery time.
I am open to suggestions if you can think of any to make this as easy as
possible.
Thank You for your business,
Kathy Chapman, Assistant
Printing and Reprographics
University of California
Riverside
(909) 787-4443/4444
We are currently working on ways to publish this text in a form other than
HTML (e.g., Postscript, PDF, Frameviewer, hard copy, etc.). This, however,
is a low-priority project. Please do not contact Randall Hyde concerning
this effort. When something happens, an announcement will appear on "Randall
Hyde's Assembly Language Page." Please visit this WEB site at http://webster.ucr.edu
for the latest scoop.
Chapter Eight MASM: Directives & Pseudo-Opcodes
Statements like mov ax,0 and add ax,bx are meaningless to the
microprocessor. As arcane as these statements appear, they are still
human-readable forms of 80x86 instructions. The 80x86 responds to commands
like B80000 and 03C3. An assembler is a program that converts strings like
mov ax,0 to 80x86 machine code like "B80000".
An assembly language program consists of statements like mov ax,0.
The assembler converts an assembly language source file to machine code
- the binary equivalent of the assembly language program. In this respect,
the assembler program is much like a compiler: it reads an ASCII source
file from the disk and produces a machine language program as output. The
major difference between a compiler for a high level language (HLL) like
Pascal and an assembler is that the compiler usually emits several machine
instructions for each Pascal statement. The assembler generally emits a
single machine instruction for each assembly language statement.
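The "B80000" and "03C3" examples above can be reproduced by hand. The Python sketch below is a minimal model, not how MASM works internally. The function names are hypothetical, and it covers only two instruction forms. The first is mov ax, imm16: opcode B8h followed by the 16-bit immediate, low byte first. The second is add between two 16-bit registers: opcode 03h followed by a register-mode mod-reg-r/m byte.

```python
# Register numbers in the order the 80x86 encodes them (AX=0, CX=1, DX=2, BX=3, ...).
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3, "sp": 4, "bp": 5, "si": 6, "di": 7}

def encode_mov_ax_imm16(value: int) -> bytes:
    """mov ax, imm16: opcode B8h, then the immediate in little-endian order."""
    return bytes([0xB8]) + (value & 0xFFFF).to_bytes(2, "little")

def encode_add_reg16_reg16(dest: str, src: str) -> bytes:
    """add dest, src using opcode 03h (ADD r16, r/m16).

    The mod-reg-r/m byte has mod=11 (register operand), reg=dest, r/m=src.
    """
    modrm = 0b11000000 | (REG16[dest] << 3) | REG16[src]
    return bytes([0x03, modrm])

print(encode_mov_ax_imm16(0).hex().upper())              # B80000
print(encode_add_reg16_reg16("ax", "bx").hex().upper())  # 03C3
```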
Attempting to write programs in machine language (i.e., in binary) is not
particularly bright. This process is very tedious, prone to mistakes, and
offers almost no advantages over programming in assembly language. The only
major disadvantage to assembly language over pure machine code is that you
must first assemble and link a program before you can execute it. However,
attempting to assemble the code by hand would take far longer than the small
amount of time that the assembler takes to perform the conversion for you.
There is another disadvantage to learning assembly language. An assembler
like Microsoft's Macro Assembler (MASM) provides a large number of features
for assembly language programmers. Although learning about these features
takes a fair amount of time, they are so useful that it is well worth the
effort.
8.0 Chapter Overview
Like Chapter Six, much of the information in this chapter is reference
material. Like any reference section, some knowledge is essential, other
material is handy, but optional, and some material you may never use while
writing programs. The following list outlines the information in this chapter.
A "*" symbol marks the essential material. The "o" symbol
marks the optional and lesser used subjects.
* Assembly language statement source format
o The location counter
* Symbols and identifiers
* Constants
* Procedure declarations
o Segments in an assembly language program
* Variables
* Symbol types
* Address expressions (later subsections contain advanced material)
o Conditional assembly
o Macros
o Listing directives
o Separate assembly
8.1 Assembly Language Statements
Assembly language statements in a source file use the following format:
{Label} {Mnemonic {Operand}} {;Comment}
Each entity above is a field. The four fields above are the label field,
the mnemonic field, the operand field, and the comment field.
The label field is (usually) an optional field containing a symbolic label
for the current statement. Labels are used in assembly language, just as
in HLLs, to mark lines as the targets of GOTOs (jumps). You can also specify
variable names, procedure names, and other entities using symbolic labels.
Most of the time the label field is optional, meaning a label need be present
only if you want a label on that particular line. Some mnemonics, however,
require a label, others do not allow one. In general, you should always
begin your labels in column one (this makes your programs easier to read).
A mnemonic is an instruction name (e.g., mov, add, etc.). The word mnemonic
means memory aid. mov is much easier to remember than the binary equivalent
of the mov instruction!
In the statement format above, the braces denote optional items. Note,
however, that you cannot have an operand without a mnemonic.
The mnemonic field contains an assembler instruction. Instructions are divided
into three classes: 80x86 machine instructions, assembler directives, and
pseudo opcodes. 80x86 instructions, of course, are assembler mnemonics that
correspond to the actual 80x86 instructions introduced in Chapter Six.
Assembler directives are special instructions that provide information to
the assembler but do not generate any code. Examples include the segment
directive, equ, assume, and end. These mnemonics are not valid 80x86
instructions. They are messages to the assembler, nothing else.
A pseudo-opcode is a message to the assembler, just like an assembler
directive; however, a pseudo-opcode emits object code bytes. Examples of
pseudo-opcodes include byte, word, dword, qword, and tbyte. These
instructions emit the bytes of data specified by their operands, but they
are not true 80x86 machine instructions.
The operand field contains the operands, or parameters, for the instruction
specified in the mnemonic field. Operands never appear on lines by themselves.
The type and number of operands (zero, one, two, or more) depend entirely
on the specific instruction.
The comment field allows you to annotate each line of source code in your
program. Note that the comment field always begins with a semicolon. When
the assembler is processing a line of text, it completely ignores everything
on the source line following a semicolon.
Each assembly language statement appears on its own line in the source file.
You cannot have multiple assembly language statements on a single line.
On the other hand, since all the fields in an assembly language statement
are optional, blank lines are fine. You can use blank lines anywhere in
your source file. Blank lines are useful for spacing out certain sections
of code, making them easier to read.
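The four-field layout can be illustrated with a small Python sketch that splits one source line into its label, mnemonic, operand, and comment fields. The function split_fields is hypothetical and deliberately simpler than MASM. It assumes a label always ends with a colon, although MASM also accepts data labels without one. It also treats the first semicolon on the line as the start of the comment, even if that semicolon sits inside a quoted string.

```python
def split_fields(line: str):
    """Split one source line into (label, mnemonic, operands, comment).

    Missing fields come back as None. Simplified rules: a label must end
    with ':' and the comment starts at the first ';' on the line.
    """
    code, semicolon, comment = line.partition(";")
    comment = comment.strip() if semicolon else None
    rest = code.split(None, 1)          # first token, then everything after it
    label = None
    if rest and rest[0].endswith(":"):
        label = rest[0][:-1]
        rest = rest[1].split(None, 1) if len(rest) > 1 else []
    mnemonic = rest[0] if rest else None
    operands = rest[1].strip() if len(rest) > 1 else None
    return label, mnemonic, operands, comment

print(split_fields("ClrLoop: mov Array[bx], 0 ;clear one element"))
# ('ClrLoop', 'mov', 'Array[bx], 0', 'clear one element')
```

Because the free-form rule lets fields start in any column, the sketch splits on runs of whitespace instead of fixed column positions.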
The Microsoft Macro Assembler is a free form assembler. The various fields
of an assembly language statement may appear in any column (as long as they
appear in the proper order). Any number of spaces or tabs can separate the
various fields in the statement. To the assembler, the following two code
sequences are identical:
______________________________________________________
                mov     ax, 0
                mov     bx, ax
                add     ax, dx
                mov     cx, ax
______________________________________________________
mov ax,               0
             mov bx,    ax
      add ax, dx
                          mov              cx, ax
______________________________________________________
The first code sequence is much easier to read than the second (if you don't
think so, perhaps you should go see a doctor!). With respect to readability,
the judicious use of spacing within your program can make all the difference
in the world.
Placing the labels in column one, the mnemonics in column 17 (two tabstops),
the operand field in column 25 (the third tabstop), and the comments out
around column 41 or 49 (five or six tabstops) produces the best looking
listings. Assembly language programs are hard enough to read as it is. Formatting
your listings to help make them easier to read will make them much easier
to maintain.
You may have a comment on the line by itself. In such a case, place the
semicolon in column one and use the entire line for the comment, examples:
; The following section of code positions the cursor to the upper
; left hand position on the screen:

                mov     X, 0
                mov     Y, 0

; Now clear from the current cursor position to the end of the
; screen to clear the video display:

;               etc.
8.2 The Location Counter
Recall that all addresses in the 80x86's memory space consist of a segment
address and an offset within that segment. The assembler, in the process
of converting your source file into object code, needs to keep track of
offsets within the current segment. The location counter is an assembler
variable that handles this.
Whenever you create a segment in your assembly language source file (see
segments later in this chapter), the assembler associates the current location
counter value with it. The location counter contains the current offset
into the segment. Initially (when the assembler first encounters a segment)
the location counter is set to zero. When encountering instructions or pseudo-opcodes,
MASM increments the location counter for each byte written to the object
code file. For example, MASM increments the location counter by two after
encountering mov ax, bx, since this instruction is two bytes long.
The value of the location counter varies throughout the assembly process.
It changes for each line of code in your program that emits object code.
We will use the term location counter to mean the value of the location
counter at a particular statement before generating any code. Consider the
following assembly language statements:
0 : or ah, 9
3 : and ah, 0c9h
6 : xor ah, 40h
9 : pop cx
A : mov al, cl
C : pop bp
D : pop cx
E : pop dx
F : pop ds
10: ret
The or, and, and xor instructions are all three bytes long; the mov
instruction is two bytes long; the remaining instructions are all one byte
long. If these instructions appear at the beginning of a segment, the
location counter would be the same as the (hexadecimal) numbers that appear
immediately to the left of each instruction above. For example, the or
instruction above begins at offset zero. Since the or instruction is three
bytes long, the next instruction (and) follows at offset three. Likewise,
and is three bytes long, so xor follows at offset six, etc.
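The offset column in the listing above can be reproduced mechanically. At each statement, the location counter is the running sum of the sizes of all earlier statements in the segment. This Python sketch assumes the instruction sizes are already known, whereas a real assembler has to work them out while encoding each instruction. location_counters is a hypothetical helper.

```python
def location_counters(sizes, start=0):
    """Given each statement's size in bytes, return the location counter value
    at each statement (before it emits code) and the value after the last one."""
    offsets = []
    counter = start
    for size in sizes:
        offsets.append(counter)
        counter += size
    return offsets, counter

# Sizes of: or, and, xor, pop cx, mov, pop bp, pop cx, pop dx, pop ds, ret
sizes = [3, 3, 3, 1, 2, 1, 1, 1, 1, 1]
offsets, end = location_counters(sizes)
print(" ".join(f"{n:X}" for n in offsets))  # 0 3 6 9 A C D E F 10
```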
8.3 Symbols
Consider the jmp instruction for a moment. This instruction takes the form:
                jmp     target
Target is the destination address. Imagine how painful it would be if you
had to actually specify the target memory address as a numeric value. If
you've ever programmed in BASIC (where line numbers are the same thing as
statement labels) you've experienced about 10% of the trouble you would
have in assembly language if you had to specify the target of a jmp by an
address.
To illustrate, suppose you wanted to jump to some group of instructions
you've yet to write. What is the address of the target instruction? How
can you tell until you've written every instruction before the target instruction?
What happens if you change the program? (Remember, inserting and deleting
instructions will cause the location counter values for all the following
instructions within that segment to change.) Fortunately, all these problems
are of concern only to machine language programmers. Assembly language programmers
can deal with addresses in a much more reasonable fashion - by using symbolic
addresses.
A symbol, identifier, or label is a name associated with some particular
value. This value can be an offset within a segment, a constant, a string,
a segment address, an offset within a record, or even an operand for an
instruction. In any case, a label provides us with the ability to represent
some otherwise incomprehensible value with a familiar, mnemonic name.
A symbolic name consists of a sequence of letters, digits, and special characters,
with the following restrictions:
- A symbol cannot begin with a numeric digit.
- A name can have any combination of upper and lower case alphabetic characters.
The assembler treats upper and lower case equivalently.
- A symbol may contain any number of characters; however, only the first
31 are significant. The assembler ignores all characters beyond the 31st.
- The _, $, ?, and @ symbols may appear anywhere within a symbol. However,
$ and ? are special symbols; you cannot create a symbol made up solely of
these two characters.
- A symbol cannot match any name that is a reserved symbol. The following
symbols are reserved:
%out .186 .286 .286P
.287 .386 .386P .387
.486 .486P .8086 .8087
.ALPHA .BREAK .CODE .CONST
.CREF .DATA .DATA? .DOSSEG
.ELSE .ELSEIF .ENDIF .ENDW
.ERR .ERR1 .ERR2 .ERRB
.ERRDEF .ERRDIF .ERRDIFI .ERRE
.ERRIDN .ERRIDNI .ERRNB .ERRNDEF
.ERRNZ .EXIT .FARDATA .FARDATA?
.IF .LALL .LFCOND .LIST
.LISTALL .LISTIF .LISTMACRO .LISTMACROALL
.MODEL .MSFLOAT .NO87 .NOCREF
.NOLIST .NOLISTIF .NOLISTMACRO .RADIX
.REPEAT .UNTIL .SALL .SEQ
.SFCOND .STACK .STARTUP .TFCOND
.UNTIL .UNTILCXZ .WHILE .XALL
.XCREF .XLIST ALIGN ASSUME
BYTE CATSTR COMM COMMENT
DB DD DF DOSSEG
DQ DT DW DWORD
ECHO ELSE ELSEIF ELSEIF1
ELSEIF2 ELSEIFB ELSEIFDEF ELSEIFDIF
ELSEIFE ELSEIFIDN ELSEIFNB ELSEIFNDEF
END ENDIF ENDM ENDP
ENDS EQU EVEN EXITM
EXTERN EXTRN EXTERNDEF FOR
FORC FWORD GOTO GROUP
IF IF1 IF2 IFB
IFDEF IFDIF IFDIFI IFE
IFIDN IFIDNI IFNB IFNDEF
INCLUDE INCLUDELIB INSTR INVOKE
IRP IRPC LABEL LOCAL
MACRO NAME OPTION ORG
PAGE POPCONTEXT PROC PROTO
PUBLIC PURGE PUSHCONTEXT QWORD
REAL4 REAL8 REAL10 RECORD
REPEAT REPT SBYTE SDWORD
SEGMENT SIZESTR STRUC STRUCT
SUBSTR SUBTITLE SUBTTL SWORD
TBYTE TEXTEQU TITLE TYPEDEF
UNION WHILE WORD
In addition, all valid 80x86 instruction names and register names are reserved.
Note that this list applies to Microsoft's Macro Assembler version
6.0. Earlier versions of the assembler have fewer reserved words. Later
versions may have more.
Some examples of valid symbols include:
L1 Bletch RightHere
Right_Here Item1 __Special
$1234 @Home $_@1
Dollar$ WhereAmI? @1234
$1234 and @1234 are perfectly valid, strange though they may seem.
Some examples of illegal symbols include:
1TooMany - Begins with a digit.
Hello.There - Contains a period in the middle of the symbol.
$ - Cannot have $ or ? by itself.
LABEL - Assembler reserved word.
Right Here - Symbols cannot contain spaces.
Hi,There - Symbols cannot contain commas or other special characters besides _, ?, $, and @.
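The naming rules above translate into a short validity check. In this Python sketch, is_valid_symbol and significant_name are hypothetical names. RESERVED_SAMPLE lists only a handful of reserved words as a stand-in for the full table. Names are folded to upper case to model MASM's case-insensitivity, and the 31-character limit is the one stated in the rules.

```python
import string

# Stand-in for MASM's reserved-word table: only a few entries, for illustration.
RESERVED_SAMPLE = {"LABEL", "BYTE", "WORD", "PROC", "END", "MOV", "AX"}
SYMBOL_CHARS = set(string.ascii_letters + string.digits + "_$?@")

def is_valid_symbol(name: str) -> bool:
    """Apply the naming rules listed above to a candidate symbol."""
    if not name or name[0] in string.digits:
        return False                      # empty, or begins with a digit
    if any(ch not in SYMBOL_CHARS for ch in name):
        return False                      # space, period, comma, etc.
    if set(name) <= {"$", "?"}:
        return False                      # made up solely of $ and ?
    return name.upper() not in RESERVED_SAMPLE

def significant_name(name: str) -> str:
    """Key the assembler would file the symbol under: case folded, first 31 chars."""
    return name[:31].upper()
```

With this model, two long names that differ only after the 31st character map to the same significant_name, so the assembler would treat them as the same symbol.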
Symbols, as mentioned previously, can be assigned numeric values (such as
location counter values), strings, or even whole operands. To keep things
straightened out, the assembler assigns a type to each symbol. Examples
of types include near, far, byte, word, double word, quad word, text, and
strings. How you declare labels of a certain type is the subject of much
of the rest of this chapter. For now, simply note that the assembler always
assigns some type to a label and will tend to complain if you try to use
a label at some point where it does not allow that type of label.
8.4 Literal Constants
The Microsoft Macro Assembler (MASM) is capable of processing five different
types of constants: integers, packed binary coded decimal integers, real
numbers, strings, and text. In this chapter we'll consider integers, reals,
strings, and text only. For more information about packed BCD integers please
consult the Microsoft Macro Assembler Programmer's Guide.
A literal constant is one whose value is implicit from the characters that
make up the constant. Examples of literal constants include:
- 123
- 3.14159
- "Literal String Constant"
- 0FABCh
- 'A'
- <Text Constant>
Except for the last example above, most of these literal constants should
be reasonably familiar to anyone who has written a program in a high level
language like Pascal or C++. Text constants are special forms of strings
that allow textual substitution during assembly.
A literal constant's representation corresponds to what we would normally
expect for its "real world value." Literal constants are also
known as nonsymbolic constants since they use the value's actual representation,
rather than some symbolic name, within your program. MASM also lets you
define symbolic, or manifest, constants in a program, but more on that later.
8.4.1 Integer Constants
An integer constant is a numeric value that can be specified in binary,
decimal, or hexadecimal. The choice of the base (or radix) is up to you.
The following table shows the legal digits for each radix:
Digits Used With Each Radix

| Name | Base | Valid Digits |
|---|---|---|
| Binary | 2 | 0 1 |
| Decimal | 10 | 0 1 2 3 4 5 6 7 8 9 |
| Hexadecimal | 16 | 0 1 2 3 4 5 6 7 8 9 A B C D E F |
To differentiate between numbers in the various bases, you use a suffix
character. If you terminate a number with a "b" or "B",
then MASM assumes that it is a binary number. If it contains any digits
other than zero or one the assembler will generate an error. If the suffix
is "t", "T", "d" or "D", then the
assembler assumes that the number is a decimal (base 10) value. A suffix
of "h" or "H" will select the hexadecimal radix.
All integer constants must begin with a decimal digit, including hexadecimal
constants. To represent the value "FDED" you must specify 0FDEDh.
The leading decimal digit is required by the assembler so that it can differentiate
between symbols and numeric constants; remember, "FDEDh" is a
perfectly valid symbol to the Microsoft Macro Assembler.
Examples:
0F000h 12345d 0110010100b
1234h 100h 08h
If you do not specify a suffix after your numeric constants, the assembler
uses the current default radix. Initially, the default radix is decimal.
Therefore, you can usually specify decimal values without the trailing "D"
character. The .radix assembler directive can be used to change the default
radix to some other base. The .radix directive takes the following form:
                .radix  base            ;Optional comment
Base is a decimal value between 2 and 16.
The .radix statement takes effect as soon as MASM encounters it in the
source file. All the statements before the .radix statement will use the
previous default base for numeric constants. By sprinkling multiple .radix
directives throughout your source file, you can switch the default base
amongst several values depending upon what's most convenient at each point
in your program.
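The suffix and default-radix rules can be summarized in a Python sketch. parse_masm_int is a hypothetical helper, and it covers only the suffixes described in this section. Because B and D are themselves hexadecimal digits, the sketch assumes that a trailing B or D counts as a suffix only when the default radix is 10 or less. With a larger default radix, that letter is read as the last digit of the number.

```python
HEX_DIGITS = "0123456789abcdef"

def parse_masm_int(text: str, default_radix: int = 10) -> int:
    """Evaluate an integer constant using the suffix rules of this section."""
    if not text or text[0] not in "0123456789":
        raise ValueError(f"{text!r}: must begin with a decimal digit")
    suffixes = {"h": 16, "t": 10}
    if default_radix <= 10:
        # B and D are hex digits when the default radix is above 10.
        suffixes.update({"b": 2, "d": 10})
    last = text[-1].lower()
    if last in suffixes:
        radix, digits = suffixes[last], text[:-1]
    else:
        radix, digits = default_radix, text
    value = 0
    for ch in digits.lower():
        d = HEX_DIGITS.find(ch)
        if d < 0 or d >= radix:
            raise ValueError(f"{text!r}: {ch!r} is not a base-{radix} digit")
        value = value * radix + d
    return value

print(parse_masm_int("0FDEDh"))  # 65005
```

With a default radix of 16, parse_masm_int("12D", 16) reads all three characters as hex digits, while "12t" is still decimal 12.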
Generally, decimal is fine as the default base, so the .radix directive
doesn't get used much. However, faced with entering a gigantic table of
hexadecimal values, you can save a lot of typing by temporarily switching
to base 16 before the table and switching back to decimal after the table.
Note: if the default radix is hexadecimal, you should use the "T" suffix
to denote decimal values since MASM will confuse the "D" suffix with a
hexadecimal digit.
8.4.2 String Constants
A string constant is a sequence of characters surrounded by apostrophes
or quotation marks.
Examples:
"This is a string"
'So is this'
You may freely place apostrophes inside string constants enclosed by quotation
marks and vice versa. If you want to place an apostrophe inside a string
delimited by apostrophes, you must place a pair of apostrophes next to each
other in the string, e.g.,
'Doesn''t this look weird?'
Quotation marks appearing within a string delimited by quotes must also
be doubled up, e.g.,
"Microsoft claims ""Our software is very fast."" Do you believe them?"
Although you can double up apostrophes or quotes as shown in the examples
above, the easiest way to include these characters in a string is to use
the other character as the string delimiter:
"Doesn't this look weird?"
'Microsoft claims "Our software is very fast." Do you believe them?'
The only time it would be absolutely necessary to double up quotes or apostrophes
in a string is if that string contained both symbols. This rarely happens
in real programs.
As in the C and C++ programming languages, MASM makes a subtle distinction between
a character value and a string value. A single character (that is, a string
of length one) may appear anywhere MASM allows an integer constant or a
string. If you specify a character constant where MASM expects an integer
constant, MASM uses the ASCII code of that character as the integer value.
Strings (whose length is greater than one) are allowed only within certain
contexts.
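The quoting rules and the character-versus-string distinction can be sketched in Python. decode_masm_string and char_value are hypothetical helpers. The first returns the characters of a quoted constant and collapses each doubled delimiter to a single character. The second models the one-character case, where MASM uses the character's ASCII code as an integer. For simplicity, the sketch rejects longer strings instead of modeling the contexts where MASM accepts them.

```python
def decode_masm_string(literal: str) -> str:
    """Return the characters of a quoted string constant, collapsing each
    doubled delimiter ('' or "") into a single character."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError(f"{literal!r}: needs matching ' or \" delimiters")
    quote, body = literal[0], literal[1:-1]
    chars, i = [], 0
    while i < len(body):
        if body[i] == quote:
            if i + 1 < len(body) and body[i + 1] == quote:
                chars.append(quote)   # doubled delimiter stands for one character
                i += 2
                continue
            raise ValueError(f"{literal!r}: lone {quote} inside the string")
        chars.append(body[i])
        i += 1
    return "".join(chars)

def char_value(literal: str) -> int:
    """Integer value of a one-character constant: its ASCII code."""
    text = decode_masm_string(literal)
    if len(text) != 1:
        raise ValueError(f"{literal!r}: not a single character")
    return ord(text)

print(decode_masm_string("'Doesn''t this look weird?'"))  # Doesn't this look weird?
print(char_value("'A'"))                                   # 65
```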
8.4.3 Real Constants
Within certain contexts, you can use floating point constants. MASM
allows you to express floating point constants in one of two forms: decimal
notation or scientific notation. These forms are quite similar to the format
for real numbers that Pascal, C, and other HLLs use.
The decimal form is just a sequence of digits containing a decimal point
in some position of the number:
1.0 3.14159 625.25 -128.0 0.5
Scientific notation is also identical to the form used by various HLLs:
1e5 1.567e-2 -6.02e-10 5.34e+12
The exact range and precision of the numbers depend on your particular
floating point package. However, MASM generally emits binary data for the
above constants that is compatible with the 80x87 numeric coprocessors.
This form corresponds to the numeric format specified by the IEEE standard
for floating point values. In particular, the constant 1.0 is not the
binary equivalent of the integer one.
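The following Python sketch shows why 1.0 and the integer one differ in memory. It uses the standard struct module to produce IEEE single- and double-precision bit patterns in the 80x86's little-endian byte order. It assumes MASM emits the standard IEEE layout described above. The helper names are illustrative.

```python
import struct

def ieee_single_bytes(x: float) -> bytes:
    """32-bit IEEE single precision, least significant byte first (80x86 order)."""
    return struct.pack("<f", x)

def ieee_double_bytes(x: float) -> bytes:
    """64-bit IEEE double precision, least significant byte first."""
    return struct.pack("<d", x)

def int32_bytes(n: int) -> bytes:
    """A 32-bit two's complement integer, least significant byte first."""
    return (n & 0xFFFFFFFF).to_bytes(4, "little")

print(ieee_single_bytes(1.0).hex(" "))  # 00 00 80 3f
print(int32_bytes(1).hex(" "))          # 01 00 00 00
```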
8.4.4 Text Constants
Text constants are not the same thing as string constants. A textual
constant substitutes verbatim during the assembly process. For example,
the characters 5[bx] could be a textual constant associated with the
symbol VAR1. During assembly, an instruction of the form mov ax, VAR1
would be converted to the instruction mov ax, 5[bx].
Textual equates are quite useful in MASM because MASM often insists on long
strings of text for some simple assembly language operands. Using text equates
allows you to simplify such operands by substituting some string of text
for a single identifier in a statement.
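Text substitution can be modeled as a search-and-replace on whole identifiers. The Python sketch below is a simplification that uses hypothetical names. It strips the < and > delimiters from each text constant and matches names case-insensitively. It makes a single substitution pass, so it does not model whether or how MASM rescans substituted text. It also does not skip over quoted strings.

```python
import re

# An identifier: a letter or _ $ ? @, then letters, digits, or _ $ ? @.
# The lookbehind keeps the pattern from matching the tail of a number like 0FDEDh.
IDENT = re.compile(r"(?<![A-Za-z0-9_$?@])[A-Za-z_$?@][A-Za-z0-9_$?@]*")

def text_constant(literal: str) -> str:
    """Strip the < > delimiters from a text constant such as <5[bx]>."""
    if len(literal) < 2 or literal[0] != "<" or literal[-1] != ">":
        raise ValueError(f"{literal!r} is not a <...> text constant")
    return literal[1:-1]

def expand_text_equates(line: str, equates: dict) -> str:
    """Replace every whole identifier that names a text equate with its text."""
    table = {name.upper(): text_constant(lit) for name, lit in equates.items()}
    return IDENT.sub(lambda m: table.get(m.group(0).upper(), m.group(0)), line)

print(expand_text_equates("mov ax, VAR1", {"VAR1": "<5[bx]>"}))  # mov ax, 5[bx]
```

Matching whole identifiers, not raw substrings, keeps a name such as VAR10 from being partly replaced by the VAR1 equate.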
A text constant consists of a sequence of characters surrounded by the "<"
and ">" symbols. For example, the text constant 5[bx] would normally be
written as <5[bx]>. When the text substitution occurs, MASM strips the
delimiting "<" and ">" characters.
8.5 Declaring Manifest Constants Using Equates
A manifest constant is a symbol name that represents some fixed quantity
during the assembly process. That is, it is a symbolic name that represents
some value. Equates are the mechanism MASM uses to declare symbolic constants.
Equates take three basic forms:
symbol equ expression
symbol = expression
symbol textequ expression
The expression operand is typically a numeric expression or a text string.
The symbol is given the value and type of the expression. The equ and "="
directives have been with MASM since the beginning. Microsoft added the
textequ directive starting with MASM 6.0.
The purpose of the "=" directive is to define symbols that have
an integer (or single character) quantity associated with them. This directive
does not allow real, string, or text operands. This is the primary directive
you should use to create numeric symbolic constants in your programs. Some
examples:
NumElements     =       16
                .
                .
                .
Array           byte    NumElements dup (?)
                .
                .
                .
                mov     cx, NumElements
                mov     bx, 0
ClrLoop:        mov     Array[bx], 0
                inc     bx
                loop    ClrLoop
The textequ directive defines a text substitution symbol. The expression
in the operand field must be a text constant delimited with the "<" and
">" symbols. Whenever MASM encounters the symbol within a statement, it
substitutes the text in the operand field for the symbol. Programmers
typically use this equate to save typing or to make some code more
readable:
Count           textequ <6[bp]>
DataPtr         textequ <8[bp]>
                .
                .
                .
                les     bx, DataPtr     ;Same as les bx, 8[bp]
                mov     cx, Count       ;Same as mov cx, 6[bp]
                mov     al, 0
ClrLp:          mov     es:[bx], al
                inc     bx
                loop    ClrLp
Note that it is perfectly legal to equate a symbol to a blank operand using
an equate like the following:
BlankEqu textequ <>
The purpose of such an equate will become clear in the sections on conditional
assembly and macros.
The equ directive provides almost a superset of the capabilities of the
"=" and textequ directives. It allows operands that are numeric, text, or
string literal constants. The following are all legal uses of the equ
directive:
One             equ     1
Minus1          equ     -1
TryAgain        equ     'Y'
StringEqu       equ     "Hello there"
TxtEqu          equ     <4[si]>
                .
                .
                .
HTString        byte    StringEqu       ;Same as HTString byte "Hello there"
                .
                .
                .
                mov     ax, TxtEqu      ;Same as mov ax, 4[si]
                .
                .
                .
                mov     bl, One         ;Same as mov bl, 1
                cmp     al, TryAgain    ;Same as cmp al, 'Y'
Manifest constants you declare with equates help you parameterize a program.
If you use the same value, string, or text, multiple times within a program,
using a symbolic equate will make it very easy to change that value in future
modifications to the program. Consider the following example:
Array           byte    16 dup (?)
                .
                .
                .
                mov     cx, 16
                mov     bx, 0
ClrLoop:        mov     Array[bx], 0
                inc     bx
                loop    ClrLoop
If you decide you want Array to have 32 elements rather than 16, you will
need to search your program for every reference to this data and adjust
the literal constants accordingly. Then there is the possibility that you
missed modifying some particular section of code, introducing a bug into
your program. On the other hand, if you use the NumElements symbolic
constant shown earlier, you would only have to change a single statement
in your program, reassemble it, and you would be in business; MASM would
automatically update all references using NumElements.
MASM lets you redefine symbols declared with the "=" directive.
That is, the following is perfectly legal:
SomeSymbol      =       0
                .
                .
                .
SomeSymbol      =       1
Since you can change the value of a constant in the program, the symbol's
scope (where the symbol has a particular value) becomes important. If you
could not redefine a symbol, one would expect the symbol to have that constant
value everywhere in the program. Given that you can redefine a constant,
a symbol's scope cannot be the entire program. The solution MASM uses is
the obvious one: a manifest constant's scope is from the point it is defined
to the point it is redefined. This has one important ramification - you
must define a symbol with the "=" directive before you use that constant.
Of course, once you redefine a symbolic constant, the previous value of
that constant is forgotten. Note that you cannot redefine symbols you
declare with the textequ or equ directives.
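The scoping rule can be modeled by processing definitions in source order. In this Python sketch, EquateTable is a hypothetical class. Symbols defined with "=" may be reassigned and always yield their most recent value. A second definition of an equ symbol is rejected, as described above. The model handles only numeric values and ignores forward references.

```python
class EquateTable:
    """Hypothetical model of equates, fed one definition at a time in source order."""

    def __init__(self):
        self._values = {}       # upper-cased name -> current value
        self._directive = {}    # upper-cased name -> "=" or "equ"

    def define(self, name: str, value: int, directive: str) -> None:
        key = name.upper()
        if key in self._directive and not (self._directive[key] == "=" and directive == "="):
            raise ValueError(f"{name}: symbol redefinition not allowed")
        self._values[key] = value         # a '=' redefinition replaces the old value
        self._directive[key] = directive

    def value(self, name: str) -> int:
        key = name.upper()
        if key not in self._values:
            raise KeyError(f"{name} is used before it is defined")
        return self._values[key]

table = EquateTable()
table.define("SomeSymbol", 0, "=")
print(table.value("SomeSymbol"))   # 0
table.define("SomeSymbol", 1, "=")
print(table.value("SomeSymbol"))   # 1
```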
8.6 Processor Directives
By default, MASM will only assemble instructions that are available
on all members of the 80x86 family. In particular, this means it will not
assemble instructions that are not available on the 8086 and 8088 microprocessors.
By generating an error for non-8086 instructions, MASM prevents the accidental
use of instructions that are not available on various processors. This is
great unless, of course, you actually want to use those instructions available
on processors beyond the 8086 and 8088. The processor directives let you
enable the assembly of instructions available on later processors.
The processor directives are
.8086 .8087 .186 .286 .287
.286P .386 .387 .386P .486
.486P .586 .586P
None of these directives accept any operands.
The processor directives enable all instructions available on a given processor.
Since the 80x86 family is upwards compatible, specifying a particular processor
directive enables all instructions on that processor and all earlier processors
as well.
The .8087, .287, and .387 directives activate the floating point
instruction set for the given floating point coprocessors. However, the
.8086 directive also enables the 8087 instruction set; likewise, .286
enables the 80287 instruction set and .386 enables the 80387 floating
point instruction set.
About the only purpose for these FPU (floating point unit) directives is
to allow 80287 instructions with the 8086 or 80186 instruction set, or 80387
instructions with the 8086, 80186, or 80286 instruction set.
The processor directives ending with a "P" allow assembly of privileged
mode instructions. Privileged mode instructions are only useful to those
writing operating systems, certain device drivers, and other advanced system
routines. Since this text does not discuss privileged mode instructions,
there is little need to discuss these privileged mode directives further.
The 80386 and later processors support two types of segments when operating
in protected mode: 16 bit segments and 32 bit segments. In real mode, these
processors support only 16 bit segments. The assembler must generate subtly
different opcodes for 16 and 32 bit segments. If you've specified a 32 bit
processor using .386, .486, or .586, MASM generates instructions for 32 bit
segments by default. If you attempt to run such code in real mode under
MS-DOS, you will probably crash the system. There are two solutions to this
problem. The first is to specify use16 as an operand to each segment you
create in your program. The second is slightly more practical: simply put
the following statement after the 32 bit processor directive:
option segment:use16
This directive tells MASM to generate 16 bit segments by default, rather
than 32 bit segments.
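To see why the default segment size matters, here is a small Python sketch that encodes one instruction both ways. It is an illustration, not part of MASM, and the function name encode_mov_imm is hypothetical. Opcode B8h means "mov ax, imm16" in a 16 bit segment but "mov eax, imm32" in a 32 bit segment, so whenever the operand size differs from the segment's default, the assembler must add a 66h operand-size prefix.

```python
def encode_mov_imm(reg_bits, imm, segment_bits):
    """Encode "mov ax, imm16" (reg_bits=16) or "mov eax, imm32" (reg_bits=32).

    segment_bits is the default operand size of the segment (16 for USE16,
    32 for USE32). Opcode B8h moves an immediate into AX or EAX, whichever
    matches the segment default; a 66h prefix selects the other size.
    """
    if reg_bits not in (16, 32) or segment_bits not in (16, 32):
        raise ValueError("sizes must be 16 or 32")
    code = bytearray()
    if reg_bits != segment_bits:
        code.append(0x66)  # operand-size override prefix
    code.append(0xB8)      # mov (e)ax, immediate
    mask = (1 << reg_bits) - 1
    code += (imm & mask).to_bytes(reg_bits // 8, "little")
    return bytes(code)


print(encode_mov_imm(16, 0, 16).hex(" "))  # b8 00 00
print(encode_mov_imm(16, 0, 32).hex(" "))  # 66 b8 00 00
```

If the bytes assembled for a 32 bit segment (66h B8h 00h 00h for mov ax, 0) are executed in real mode, the 66h prefix now selects a 32 bit operand, so the CPU swallows two bytes of the following instruction as part of the immediate and decodes everything after it incorrectly.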
Note that MASM does not require an 80486 or Pentium processor if you specify
the .486 or .586 directives. The assembler itself is written in 80386 code,
so you only need an 80386 processor to assemble any program with MASM. Of
course, if you use 80486 or Pentium processor specific instructions, you
will need an 80486 or Pentium processor to run the assembled code.
You can selectively enable or disable various instruction sets throughout
your program. For example, you can turn on 80386 instructions for several
lines of code and then return to 8086-only instructions. The following
code sequence demonstrates this:
.386 ;Begin using 80386 instructions
.
. ;This code can have 80386 instrs.
.
.8086 ;Return back to 8086-only instr set.
.
. ;This code can only have 8086 instrs.
.
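The effect of switching directives can be sketched as a tiny checker in Python. This is a toy model under stated assumptions: the table lists only four sample mnemonics with the processor that introduced each, only the CPU directives (not .8087, .287, or .387) are handled, and the function name check is hypothetical. Each directive sets the current level, and because the family is upward compatible, an instruction is accepted whenever the processor that introduced it is at or below that level.

```python
# Toy model of MASM processor directives (CPU directives only).
CPU_ORDER = ["8086", "186", "286", "386", "486", "586"]

# Sample mnemonics and the processor that first introduced each one.
INTRODUCED_ON = {"mov": "8086", "pusha": "186", "movzx": "386", "bswap": "486"}


def check(lines):
    """Return the mnemonics that the directive in force would reject.

    Lines starting with "." are CPU directives such as ".386" or ".286P";
    every other line is a bare mnemonic from INTRODUCED_ON. Assembly
    starts at the 8086 level, which is MASM's default.
    """
    level = CPU_ORDER.index("8086")
    rejected = []
    for line in lines:
        if line.startswith("."):
            # The "P" variants also allow privileged instructions, which
            # this model ignores, so ".386P" selects the same level as ".386".
            level = CPU_ORDER.index(line[1:].rstrip("Pp"))
        elif CPU_ORDER.index(INTRODUCED_ON[line]) > level:
            rejected.append(line)
    return rejected


source = [".386", "movzx", "pusha", ".8086", "mov", "movzx"]
print(check(source))  # ['movzx'] (only the movzx after .8086 is rejected)
```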
It is possible to write a routine that detects, at run time, which processor
a program is actually running on. Therefore, you can detect an 80386
processor and use 80386 instructions; if you do not detect one, you can
stick with 8086 instructions. By selectively turning 80386 instructions on
in those sections of your program that execute only when an 80386 processor
is present, you can take advantage of the additional instructions. Likewise,
by turning off the 80386 instruction set in other sections of your program,
you can prevent the inadvertent use of 80386 instructions in the 8086-only
portions of the program.
8.7 Procedures
Unlike HLLs, MASM doesn't enforce strict rules on exactly what constitutes
a procedure. You can call code at any address in memory, and the first ret
instruction encountered along that execution path terminates the procedure.
Such expressive freedom, however, is often abused, yielding programs that
are very hard to read and maintain. Therefore, MASM provides facilities to
declare procedures within your code. The basic mechanism for declaring a
procedure is:
procname proc {NEAR or FAR}
<statements>
procname endp
As you can see, the definition of a procedure looks similar to that for
a segment. One difference is that procname (the name of the procedure
you're defining) must be a unique identifier within your program. Your code
calls the procedure by this name, so it wouldn't do to have another
procedure with the same name; if you did, how would the program determine
which routine to call?
Proc allows several different operands, though we will only consider
three: the single keyword near, the single keyword far, or a blank operand
field. MASM uses these operands to determine whether you're calling this
procedure with a near or far call instruction. They also determine which
type of ret instruction MASM emits within the procedure. Consider the
following two procedures:
NProc proc near
mov ax, 0
ret
NProc endp
FProc proc far
mov ax, 0FFFFH
ret
FProc endp
and:
call NProc
call FProc
The assembler automatically generates a three-byte (near) call for the first
call instruction above because it knows that NProc is a near procedure. It
also generates a five-byte (far) call instruction for the second call
because FProc is a far procedure. Within the procedures themselves, MASM
automatically converts all ret instructions to near or far returns
depending on the type of routine.
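The byte counts above follow directly from the 8086 instruction encodings. The Python sketch below builds both call forms; the helper names near_call and far_call are hypothetical and the addresses in the examples are arbitrary. A near call is opcode E8h plus a 16 bit displacement measured from the end of the call, while a direct far call is opcode 9Ah plus a 16 bit offset and a 16 bit segment. The near and far ret instructions MASM chooses between are the single bytes C3h and CBh.

```python
NEAR_RET = bytes([0xC3])  # ret (near): pops IP
FAR_RET = bytes([0xCB])   # ret (far): pops IP, then CS


def near_call(call_addr, target):
    """Encode a direct near call located at offset call_addr.

    The 16 bit displacement is measured from the end of the 3-byte
    instruction and wraps around within the 64K segment.
    """
    disp = (target - (call_addr + 3)) & 0xFFFF
    return bytes([0xE8]) + disp.to_bytes(2, "little")


def far_call(segment, offset):
    """Encode a direct far call: 9Ah, then the offset, then the segment."""
    return bytes([0x9A]) + offset.to_bytes(2, "little") + segment.to_bytes(2, "little")


print(near_call(0x100, 0x200).hex(" "))   # e8 fd 00
print(far_call(0x1234, 0x0010).hex(" "))  # 9a 10 00 34 12
```

The far form is two bytes longer because it must carry the target's segment as well as its offset, which is also why a far procedure needs the far ret that pops CS.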
Note that if you do not terminate a proc/endp section with a ret or some
other transfer of control instruction, and program flow runs into the endp
directive, execution will continue with the next executable instruction
following the endp. For example, consider the following:
Proc1 proc
mov ax, 0
Proc1 endp
Proc2 proc
mov bx, 0FFFFH
ret
Proc2 endp
If you call Proc1, control will flow on into Proc2 starting with the
mov bx,0FFFFh instruction. Unlike high level language procedures, an
assembly language procedure does not contain an implicit return instruction
before the endp directive. So always be aware of how the proc/endp
directives work.
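One way to see why control falls through is to model procedures as nothing more than labels on a flat list of instructions. The toy interpreter below is a hypothetical Python sketch (the names PROGRAM and run_from are illustrative, not MASM concepts): proc contributes only a label and endp contributes nothing, so running from Proc1 executes its mov and then keeps going into Proc2 until it reaches the ret.

```python
# Toy interpreter: proc only names a position in the instruction stream and
# endp generates nothing, so nothing stops execution at the end of Proc1.
PROGRAM = [
    ("label", "Proc1"),
    ("mov", "ax", 0),
    # "Proc1 endp" would be here, but it emits no instruction
    ("label", "Proc2"),
    ("mov", "bx", 0xFFFF),
    ("ret",),
]


def run_from(label, program):
    """Execute from the named label until a ret; return the registers written."""
    regs = {}
    pc = program.index(("label", label))
    while pc < len(program):
        op = program[pc]
        if op[0] == "mov":
            regs[op[1]] = op[2]
        elif op[0] == "ret":
            return regs
        pc += 1  # labels are no-ops, so execution simply continues
    raise RuntimeError("ran off the end of the program without a ret")


print(run_from("Proc1", PROGRAM))  # {'ax': 0, 'bx': 65535}
```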
There is nothing special about procedure declarations. They're a convenience
provided by the assembler, nothing more. You could write assembly language
programs for the rest of your life and never use the proc and endp
directives. Doing so, however, would be poor programming practice. Proc and
endp are marvelous documentation features which, when properly used, can
help make your programs much easier to read and maintain.
MASM versions 6.0 and later treat all statement labels inside a procedure
as local. That is, you cannot refer directly to those symbols outside the
procedure. For more details, see "How to Give a Symbol a Particular Type"
(Section 8.11.1).
Art of Assembly: Chapter Eight - 26 SEP 1996